Implementing Fine/Medium Grained TLP Support in a Many-Core Architecture
Abstract
We believe that future many-core architectures should support a simple and scalable way to execute the many threads generated by parallel programs. A good candidate for efficient and scalable thread execution is the DTA (Decoupled Threaded Architecture), which is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a hardware scheduling unit and relying on existing simple cores. In this paper, we present an initial implementation of the DTA concept in a many-core architecture, where it interacts with other architectural components designed from scratch to address the problem of scalability. We present initial results, obtained using a many-core simulator written in SARCSim (a variant of UNISIM) with DTA support, that show the scalability of the solution.
Similar resources
An On-Chip Multiprocessor Architecture with a Non-Blocking Synchronization Mechanism
tive to superscalar architectures [5][8][12][13]. Strengths of an on-chip MP architecture are threefold. First, an MP can exploit different level parallelism, thread-level parallelism (TLP), in addition to ILP. Second, the complexity can be suppressed using simple processors. This ensures a high clock rate. Third, communication latency can be significantly reduced using an on-chip network. Thes...
E—A Language for Thread-Level Parallel Programming on Synchronous Shared Memory NOCs
As systems on chip are evolving to networks on chip (NOC) providing a unified communication infrastructure for a number of computational resources, being able to easily implement computational tasks as a parallel program that can be efficiently executed by multiple resources together is becoming increasingly important. Recent advances in thread-level parallel (TLP) architectures have made it po...
A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization
Modeling multi-way data can be accomplished using tensors, which are data structures indexed along three or more dimensions. Tensors are increasingly used to analyze extremely large and sparse multi-way datasets in life sciences, engineering, and business. The canonical polyadic decomposition (CPD) is a popular tensor factorization for discovering latent features and is most commonly found via ...
Communication centric, multi-core, fine-grained processor architecture
With multi-core architectures now firmly entrenched in many application areas, both computer architects and programmers face new challenges. Computer architects must increase core count to increase the explicit parallelism available to the programmer in order to provide better performance, whilst keeping the programming model tractable. The programmer must find ways to exploit this expl...
Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture
This paper presents an approach to scheduling loops that leverages the distinctive architectural features of the XIMD, particularly the variable number of instruction streams and low synchronization cost. The classical VLIW and MIMD architectures have a fixed number of instruction streams, each with a fixed width. A compiler for the XIMD architecture can exploit fine-grained parallelism within ...